Outlier overcoming using re-sampling techniques

Authors

  • Bruno Baruque
  • Bogdan Gabrys
  • Emilio Corchado
  • Álvaro Herrero
  • Jordi Rovira
  • Javier Gonzalez
Abstract

Machine learning has used statistical re-sampling techniques extensively and successfully to generate classifier and predictor ensembles. It has frequently been shown that combining so-called unstable predictors has a stabilizing effect on the resulting prediction system and improves its performance. In this paper we apply re-sampling techniques in the context of Principal Component Analysis (PCA). We show that the proposed PCA ensembles behave much more robustly in the presence of outliers, which can seriously degrade the performance of an individual PCA algorithm. The performance and characteristics of the proposed approaches are illustrated in a number of experimental studies in which an individual PCA is compared with the introduced PCA ensemble.
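The abstract does not say how the ensemble members are built or combined. The following is a minimal sketch, assuming bootstrap re-sampling (bagging) to create the members and combining them by averaging sign-aligned first principal components. The function names `first_pc` and `bagged_first_pc`, the averaging rule, and the use of power iteration in place of a full eigendecomposition are all hypothetical choices for illustration, not the authors' method.

```python
import math
import random


def mean_vector(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def covariance(points):
    """Sample covariance matrix (d x d) of the points, as nested lists."""
    mu = mean_vector(points)
    dim = len(mu)
    centered = [[p[j] - mu[j] for j in range(dim)] for p in points]
    n = len(points)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def first_pc(points, iters=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    cov = covariance(points)
    dim = len(cov)
    v = normalize([1.0] * dim)
    for _ in range(iters):
        v = normalize([sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)])
    return v


def bagged_first_pc(points, n_models=25, seed=0):
    """Hypothetical PCA ensemble: average the first PC over bootstrap samples."""
    rng = random.Random(seed)
    n = len(points)
    reference = None
    total = None
    for _ in range(n_models):
        # Bootstrap sample: n draws from the data, with replacement.
        sample = [points[rng.randrange(n)] for _ in range(n)]
        v = first_pc(sample)
        if reference is None:
            reference = v
            total = [0.0] * len(v)
        # Eigenvectors are only defined up to sign; flip to agree with the first member.
        if sum(a * b for a, b in zip(v, reference)) < 0:
            v = [-x for x in v]
        total = [t + x for t, x in zip(total, v)]
    return normalize(total)


if __name__ == "__main__":
    rng = random.Random(42)
    # Data stretched along the x-axis, plus a few hypothetical far-off outliers.
    data = [(rng.gauss(0, 5), rng.gauss(0, 0.5)) for _ in range(200)]
    data += [(0.0, 60.0), (1.0, -55.0), (-2.0, 58.0)]
    print("single PCA :", first_pc(data))
    print("bagged PCA :", bagged_first_pc(data))
```

Comparing the two printed directions on contaminated data gives a feel for the kind of experiment the abstract describes. How much the ensemble helps depends on the data, the number of members, and the combination rule, so this sketch makes no claim about the size of the effect.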

Similar resources

Overcoming Limitations of Sampling for Aggregation Queries

We study the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To address this issue, we introduce a technique called outlier-indexing. Uniform sampling is also ineffective for queries with low selectivity. We rely on weighted sampling based on workload information ...

Bayesian analysis of outlier problems using the Gibbs sampler

We consider the Bayesian analysis of outlier models. We show that the Gibbs sampler brings considerable conceptual and computational simplicity to the problem of calculating posterior marginals. Although other techniques for finding posterior marginals are available, the Gibbs sampling approach is notable for its ease of implementation. Allowing the probability of an outlier to be unknown intro...

Classification for Imbalanced and Overlapping Classes Using Outlier Detection and Sampling Techniques

In many real-world applications, the example data among different pattern classes are imbalanced and overlapping, which hinders the classification performance of many learning algorithms. In this paper, data cleaning techniques based on BNF (the borderline noise factor) are proposed to remove the borderline noise and three under-sampling methods are studied to select the representative majority clas...

Outlier Resistant PCA Ensembles

Statistical re-sampling techniques have been used extensively and successfully in machine learning for generating classifier and predictor ensembles. It has frequently been shown that combining so-called unstable predictors has a stabilizing effect on the resulting prediction system and improves its performance. In this paper we use the resampling techniques in ...

Rapid Distance-Based Outlier Detection via Sampling

Distance-based approaches to outlier detection are popular in data mining, as they do not require modelling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling...

Publication date: 2006